Examining the Consistency of Evaluations Provided by Three Automatic Speech Recognition Systems
Abstract
In an EFL setting such as Taiwan, limited human resources often make it difficult to provide individual oral language training. Recently, several commercial CALL programs (e.g., TeLLmeMore and MyET) have claimed that their automatic speech recognition (ASR) technologies can provide high-quality pronunciation training. Although these programs give individual learners scores and feedback, many teachers and students are unsure whether the evaluations are fair and consistent. This study examined the consistency of the evaluations provided by three ASR programs (TeLLmeMore, MyET, and Microsoft). Four groups of college students interacted with the three ASR systems. The subjects read the same 20 sentences to each system and received scores and feedback from each. The entire interaction, including the subjects' performance and scores, was recorded with Camtasia, a screen and audio capture tool. Two human raters also evaluated the audio recordings of the subjects' performance. Strong correlations were found between the scores assigned by the human raters and those assigned by the ASR programs, as well as among the ASR systems themselves. These statistical results indicate that automatic speech recognition technologies can assign consistent scores to students at different proficiency levels. Although at this stage these ASR systems might not be suitable for high-stakes tests such as the TOEFL or college entrance examinations, they can be used for pedagogical interventions and low-stakes tests such as placement or diagnostic tests.
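The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say which coefficient was computed, so Pearson's r is assumed here. The function names, source labels, and score values are hypothetical placeholders for illustration only; they are not data or code from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Correlate every pair of scoring sources (human raters or ASR systems).

    `scores` maps a source label to one score per subject, with subjects
    listed in the same order for every source.
    """
    return {
        (a, b): pearson_r(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


if __name__ == "__main__":
    # Hypothetical placeholder scores for five subjects (NOT study data).
    example_scores = {
        "human_raters_mean": [62, 70, 75, 81, 90],
        "asr_system_a": [60, 72, 74, 85, 88],
        "asr_system_b": [55, 68, 77, 79, 93],
        "asr_system_c": [65, 69, 73, 84, 91],
    }
    for (a, b), r in pairwise_correlations(example_scores).items():
        print(f"{a} vs {b}: r = {r:.2f}")
```

Pairs that include the human raters correspond to the human-versus-ASR comparison, and the remaining pairs correspond to the comparison among ASR systems. A rank-based coefficient such as Spearman's would be an equally plausible choice if the score scales differ across systems.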
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned their attention to the design and production of speech recognition systems. Among these, lip-reading techniques face many challenges in speech recognition; one of the challenges b...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Improving the Performance of a Continuous Speech Recognition System Using Features Extracted from Speech Manifolds in the Reconstructed Phase Space
Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Fast Estimation of Warping Factors in Vocal Tract Length Normalization Using Scores Obtained from Gender Detection Modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...